Network Recasting: A Universal Method for Network Architecture Transformation
This paper proposes network recasting as a general method for network
architecture transformation. The primary goal of this method is to accelerate
the inference process through the transformation, but there can be many other
practical applications. The method is based on block-wise recasting; it recasts
each source block in a pre-trained teacher network to a target block in a
student network. For the recasting, a target block is trained such that its
output activation approximates that of the source block. Such a block-by-block
recasting in a sequential manner transforms the network architecture while
preserving the accuracy. This method can be used to transform an arbitrary
teacher network type to an arbitrary student network type. It can even generate
a mixed-architecture network that consists of two or more types of block. The
network recasting can generate a network with fewer parameters and/or
activations, which reduce the inference time significantly. Naturally, it can
be used for network compression by recasting a trained network into a smaller
network of the same type. Our experiments show that it outperforms previous
compression approaches in terms of actual speedup on a GPU.
Comment: AAAI 2019 Oral presentation; source code is available on GitHub:
https://github.com/joonsang-yu/Network-Recastin
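The block-by-block idea can be illustrated with a deliberately tiny sketch. It assumes scalar activations, treats each teacher block as an arbitrary function, and uses an affine map fitted by closed-form least squares as a stand-in for the cheaper student block. Feeding each student block the output of the already recast student prefix, while regressing onto the teacher's activation at the same depth, is an assumption made here for illustration. The names `fit_affine`, `recast`, and `run_student` are hypothetical, and none of this is the authors' released code.

```python
from typing import Callable, List, Tuple

Block = Callable[[float], float]


def fit_affine(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Closed-form 1-D least squares: minimise sum((a*x + c - y)**2)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var if var > 0 else 0.0
    return a, my - a * mx


def recast(teacher: List[Block], inputs: List[float]) -> List[Tuple[float, float]]:
    """Recast each teacher block, in order, into an affine student block."""
    student = []
    t_act = list(inputs)  # activations entering the current teacher block
    s_act = list(inputs)  # activations entering the current student block
    for block in teacher:
        target = [block(x) for x in t_act]   # source block's output activation
        a, c = fit_affine(s_act, target)     # "train" the target block to match it
        student.append((a, c))
        s_act = [a * x + c for x in s_act]   # propagate through the recast prefix
        t_act = target
    return student


def run_student(student: List[Tuple[float, float]], x: float) -> float:
    """Evaluate the recast (student) network on one input."""
    for a, c in student:
        x = a * x + c
    return x
```

For example, `recast([lambda x: 2 * x + 1, lambda x: 0.5 * x - 3], [0.0, 1.0, 2.0, 3.0, 4.0])` recovers the two affine blocks exactly. For a nonlinear teacher block (say, a ReLU on inputs of mixed sign) the affine student is only an approximation, which mirrors the accuracy-versus-cost trade-off that recasting navigates.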
Retrospective: A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing
Our ISCA 2015 paper provides a new programmable processing-in-memory (PIM)
architecture and system design that can accelerate key data-intensive
applications, with a focus on graph processing workloads. Our major idea was to
completely rethink the system, including the programming model, data
partitioning mechanisms, system support, instruction set architecture, along
with near-memory execution units and their communication architecture, such
that an important workload can be accelerated to the greatest extent possible using a
distributed system of well-connected near-memory accelerators. We built our
accelerator system, Tesseract, using 3D-stacked memories with logic layers,
where each logic layer contains general-purpose processing cores and cores
communicate with each other using a message-passing programming model. Cores
could be specialized for graph processing (or any other application to be
accelerated).
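The message-passing model described above can be illustrated with a small plain-Python simulation. Vertices are partitioned across hypothetical vaults; each vault iterates only over edges whose source vertex it owns, and every update destined for a vertex (local or remote) is sent as a message to the owning vault, which applies it only after a barrier. PageRank as the workload and the names `owner`, `inbox`, and `pagerank_message_passing` are assumptions for illustration; this is not Tesseract's actual programming interface.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pagerank_message_passing(
    edges: List[Tuple[int, int]],
    owner: Dict[int, int],  # vertex -> partition (vault) that stores it
    num_parts: int,
    iters: int = 20,
    d: float = 0.85,
) -> Dict[int, float]:
    vertices = sorted(owner)
    n = len(vertices)
    out_deg: Dict[int, int] = defaultdict(int)
    for u, _ in edges:
        out_deg[u] += 1
    # Each partition only sees the edges whose source vertex it owns.
    local_edges: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for u, v in edges:
        local_edges[owner[u]].append((u, v))

    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iters):
        # Phase 1: every partition sends (destination, contribution) messages
        # to the partition that owns the destination vertex.
        inbox: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
        for p in range(num_parts):
            for u, v in local_edges[p]:
                inbox[owner[v]].append((v, rank[u] / out_deg[u]))
        # Barrier: all messages are delivered before any partition applies them.
        # Phase 2: each partition applies the messages addressed to its vertices.
        new_rank = {v: (1.0 - d) / n for v in vertices}
        for p in range(num_parts):
            for v, contrib in inbox[p]:
                new_rank[v] += d * contrib
        rank = new_rank
    return rank
```

Dangling vertices (no outgoing edges) are not redistributed in this sketch, so ranks need not sum to one on such graphs.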
To our knowledge, our paper was the first to completely design a near-memory
accelerator system from scratch such that it is both generally programmable and
specifically customizable to accelerate important applications, with a case
study on major graph processing workloads. Ensuing work in academia and
industry showed that similar approaches to system design can greatly benefit
both graph processing workloads and other applications, such as machine
learning, for which ideas from Tesseract seem to have been influential.
This short retrospective provides a brief analysis of our ISCA 2015 paper and
its impact. We briefly describe the major ideas and contributions of the work,
discuss later works that built on it or were influenced by it, and make some
educated guesses on what the future may bring for PIM and accelerator systems.
Comment: Selected for the 50th Anniversary of ISCA (ACM/IEEE International
Symposium on Computer Architecture), Commemorative Issue, 202
Worst Case Execution Time Analysis for Synthesized Hardware
We propose a hardware performance estimation flow for fast design space exploration, based on worst-case execution time (WCET) analysis algorithms developed for software. Test cases on several real-world applications show that our flow provides a tight upper bound on the execution time, along with many useful hints for the designer.
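A classic building block of path-based WCET analysis is a longest-path search over an acyclic control-flow graph, where each node carries a worst-case cycle cost and each bounded loop has been collapsed into one node costing (loop bound) × (body WCET). The sketch below shows only that step under those assumptions; the node names, costs, and the function `wcet` are hypothetical, and the abstract does not say which WCET algorithms the flow actually adopts.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def wcet(costs: Dict[str, int], succs: Dict[str, List[str]], entry: str, exit_: str) -> int:
    """Longest entry-to-exit path cost in an acyclic control-flow graph."""
    preds: Dict[str, List[str]] = {n: [] for n in costs}
    for n, ss in succs.items():
        for s in ss:
            preds[s].append(n)
    # TopologicalSorter takes a mapping node -> predecessors.
    order = TopologicalSorter({n: preds[n] for n in costs}).static_order()
    best: Dict[str, int] = {}
    for n in order:
        if n == entry:
            best[n] = costs[n]
            continue
        reachable = [best[p] for p in preds[n] if p in best]
        if reachable:  # skip nodes not reachable from the entry
            best[n] = max(reachable) + costs[n]
    return best[exit_]


# Hypothetical design: a loop bounded at 10 iterations with a 5-cycle body,
# followed by a two-way branch.
costs = {"entry": 2, "loop": 10 * 5, "then": 3, "else": 7, "exit": 1}
succs = {"entry": ["loop"], "loop": ["then", "else"], "then": ["exit"], "else": ["exit"]}
print(wcet(costs, succs, "entry", "exit"))
```

Here the bound is driven by the slower `else` branch; a tighter real analysis would also prune infeasible paths.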
Partial Bus-Invert Coding for Power Optimization of Application-Specific Systems
This paper presents two bus coding schemes for power optimization
of application-specific systems: Partial Bus-Invert coding and its
extension to Multiway Partial Bus-Invert coding. In the first scheme, only
a selected subgroup of bus lines is encoded to avoid unnecessary inversion
of relatively inactive and/or uncorrelated bus lines which are not included
in the subgroup. In the extended scheme, we partition a bus into multiple
subbuses by clustering highly correlated bus lines and then encode each
subbus independently. We describe a heuristic algorithm for partitioning a
bus into subbuses for each encoding scheme. Experimental results for various
examples indicate that both encoding schemes are highly efficient for
application-specific systems.
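The encoding step can be sketched using the standard bus-invert rule (invert when more than half of the encoded lines would toggle) applied per subbus. In this sketch each bit mask selects one subbus with its own invert line, lines outside every mask pass through unencoded, and a single mask gives the partial scheme. Disjoint masks and an all-zero initial bus state are assumptions, the function names are hypothetical, and the partitioning heuristic itself is not reproduced.

```python
from typing import List, Tuple

Encoded = List[Tuple[int, Tuple[int, ...]]]


def popcount(x: int) -> int:
    return bin(x).count("1")


def encode(words: List[int], masks: List[int]) -> Encoded:
    """Multiway partial bus-invert; masks must be disjoint bit sets."""
    prev = 0  # assumption: all bus lines start low
    out = []
    for w in words:
        bus, inv = w, []
        for m in masks:
            # Lines of this subbus that would toggle if w were sent as-is.
            if popcount((w ^ prev) & m) > popcount(m) / 2:
                bus ^= m  # invert only this subbus
                inv.append(1)
            else:
                inv.append(0)
        out.append((bus, tuple(inv)))
        prev = bus
    return out


def decode(encoded: Encoded, masks: List[int]) -> List[int]:
    """Undo the inversion of every subbus whose invert line is set."""
    words = []
    for bus, inv in encoded:
        for m, bit in zip(masks, inv):
            if bit:
                bus ^= m
        words.append(bus)
    return words


def transitions(values: List[int]) -> int:
    """Toggles on the data lines only (invert lines not counted)."""
    prev, total = 0, 0
    for v in values:
        total += popcount(v ^ prev)
        prev = v
    return total
```

With a full 8-bit mask, sending `0xF0` then `0x0F` keeps the data lines at `0xF0` and raises the invert line instead of toggling all eight lines; excluding inactive lines from the mask avoids spending an invert decision on them.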
Continuum-based design sensitivity analysis and optimization of nonlinear shell structures using meshfree method
A continuum-based shape and configuration design sensitivity analysis (DSA) method for a finite deformation elastoplastic shell structure has been developed. Shell elastoplasticity is treated using the projection method that performs the return mapping on the subspace defined by the zero-normal stress condition. An incrementally objective integration scheme is used in the context of finite deformation shell analysis, wherein the stress objectivity is preserved for finite rotation increments. The material derivative concept is used to develop a continuum-based shape and configuration DSA method. Significant computational efficiency is obtained by solving the design sensitivity equation without iteration at each converged load step using the same consistent tangent stiffness matrix. Numerical implementation of the proposed shape and configuration DSA is carried out using the meshfree method. The accuracy and efficiency of the proposed method are illustrated using numerical examples.
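Why the sensitivity equation needs no iteration can be seen on a scalar toy problem. Differentiating the converged equilibrium $R(u, b) = 0$ with respect to a design variable $b$ gives $K_T\,\dot u = -\partial R/\partial b$, a single linear solve with the tangent already available from the last Newton step. The sketch assumes a nonlinear elastic spring with residual $R = b\,u + c\,u^3 - F$; it is path-independent, so it omits the history-dependent terms an elastoplastic shell analysis must carry between load steps, and `solve_and_sensitivity` is a hypothetical name.

```python
from typing import List, Tuple


def solve_and_sensitivity(
    b: float, c: float, loads: List[float], tol: float = 1e-12, max_iter: int = 50
) -> List[Tuple[float, float, float]]:
    """Incremental Newton solve of b*u + c*u**3 = F, plus du/db per load step."""
    u = 0.0
    history = []
    for F in loads:  # load steps, warm-started from the previous solution
        for _ in range(max_iter):
            R = b * u + c * u ** 3 - F
            K_T = b + 3.0 * c * u ** 2  # consistent tangent dR/du
            du = -R / K_T
            u += du
            if abs(du) < tol:
                break
        K_T = b + 3.0 * c * u ** 2      # tangent at the converged state
        dR_db = u                        # explicit design derivative of R
        dudb = -dR_db / K_T              # sensitivity: one solve, no iteration
        history.append((F, u, dudb))
    return history


steps = solve_and_sensitivity(1.0, 1.0, [0.5, 1.0, 1.5, 2.0])
```

At the last step ($F = 2$) the solution is $u = 1$, $K_T = 4$, and $\dot u = -1/4$, which a central finite difference on $b$ confirms.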
DESIGN SENSITIVITY ANALYSIS OF NONLINEAR SHELL STRUCTURE WITH FRICTIONLESS CONTACT
A continuum-based shape and configuration design sensitivity analysis method for a finite deformation elastoplastic shell structure with frictionless contact has been developed. Shell elastoplasticity is treated using the projection method that performs the return mapping on the subspace defined by the zero-normal stress condition. An incrementally objective integration scheme is used in the context of finite deformation shell analysis, wherein stress objectivity is preserved for finite rotation increments. The penalty regularization method is used to approximate the contact variational inequality. The material derivative concept is used to develop continuum-based design sensitivity. The design sensitivity equation is solved without iteration at each converged load step. Numerical implementation of the proposed shape and configuration design sensitivity analysis is carried out using the meshfree method. The accuracy and efficiency of the proposed method are illustrated using numerical examples.
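Penalty regularization replaces the non-penetration inequality with a stiff one-sided spring that acts only when the constraint is violated. A one-dimensional sketch, assuming a linear spring of stiffness `k` pushed by a force `F` toward a rigid wall at distance `gap`, minimizes $\tfrac12 k u^2 - F u + \tfrac12 \varepsilon \langle u - g \rangle_+^2$; the penalty parameter `eps` and the function name are assumptions, not the shell contact formulation used in the paper.

```python
def penalty_contact(k: float, F: float, gap: float, eps: float) -> float:
    """Displacement minimising 0.5*k*u**2 - F*u + 0.5*eps*max(u - gap, 0)**2."""
    u_free = F / k
    if u_free <= gap:
        return u_free  # constraint inactive: no contact force
    # Active branch: k*u - F + eps*(u - gap) = 0
    return (F + eps * gap) / (k + eps)


# Penetration u - gap shrinks like (F - k*gap) / (k + eps) as eps grows.
for eps in (1e2, 1e4, 1e6):
    print(eps, penalty_contact(1.0, 2.0, 1.0, eps) - 1.0)
```

The constraint is satisfied only approximately for any finite `eps`, which is the usual trade-off of penalty methods: larger values reduce penetration but worsen the conditioning of the tangent stiffness.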